Skip to content

feat(webhook): dedicated sharded apply + plan comments (per-shard, group-by-variant)#546

Merged
morgo merged 10 commits into
block:mainfrom
morgo:morgo/sharded-apply-presentation
Jun 26, 2026
Merged

feat(webhook): dedicated sharded apply + plan comments (per-shard, group-by-variant)#546
morgo merged 10 commits into
block:mainfrom
morgo:morgo/sharded-apply-presentation

Conversation

@morgo

@morgo morgo commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

What

Gives sharded schema changes their own PR-comment layout — for both the apply and the plan comment — instead of reusing the multi-deployment one. A sharded change (from the per-shard fan-out in #535) runs one operation per (shard, table) across the shards of a single keyspace within one deployment, so the unit is the shard, not the deployment.

The bug this fixes (apply comment)

The status comment reused the multi-deployment renderer. Every per-shard operation shares the one deployment (cake), so:

  • all shards rendered as identical cake rows (Deployments: 3 halted, 1 failed), and
  • the per-deployment Details map is keyed by deployment name, so the shards collided onto one entry — the failed shard's detail was overwritten by a pending sibling's, and its error was dropped. That's why a failed sharded rollout surfaced no error.

Changes

Apply commentRenderShardedApplyComment (pkg/webhook/templates/sharded_apply.go)
Shards are grouped by change signature: uniform → one Shard | Status table + DDL once; divergent (different tables, or the same table computing to different DDL because shards drifted) → one group per change set, each tying its shards' statuses to the DDL it runs. The first failed/retrying shard's error is lifted to the top and shown in its row. failed_retryable is handled like the single-deployment renderer (error surfaced + "stop retrying" footer).

image

Plan commentwriteKeyspaceChanges (pkg/webhook/templates/plan.go)
A sharded keyspace now renders what applies where: shards grouped by change signature, uniform shows the DDL once, divergent shows one DDL block per distinct change set labelled with its shards. This needed threading the per-shard plan through apitypes.PlanResponse.Shards (it previously stopped at the namespace-level Changes) via planResponseFromProtobuildPlanCommentData.

image

Detection / projection (pkg/webhook/sharded_apply.go)
isShardedApply recognises the fan-out (work ops keyed namespace/shard/table, one deployment). Per-shard status is derived through pkg/presentation with the shard name as the operation identity, so ordering labels reference shards ("waiting for -40", "halted — -40 failed") with no change to the presentation package. formatApply{Status,Summary}Comment route sharded applies to the new renderer.

No gap change required

The per-shard data already flows from gap's Strata engine through the proto plan/apply responses (#535); only the control-plane rendering changes here.

Tests

pkg/webhook + pkg/webhook/templates + pkg/api + pkg/apitypes suites pass. Covers: sharded detection / key parsing; a failed sharded apply surfacing the error (the regression); failed_retryable handling; uniform vs divergent apply rendering; divergent vs uniform plan rendering; and per-shard plan data threading through buildPlanCommentData.

Not in this PR (follow-up)

  • VSchema / group_finalizer operations are omitted from the shard view; their outcome still shows in the aggregate headline state.
  • Multiple keyspaces in one apply: the apply layout assumes a single keyspace (the fan-out's common case).

🤖 Generated with Claude Code

…ariant)

A sharded apply fans out across the shards of one keyspace within a single
deployment, but the comment renderer reused the multi-deployment layout, whose
unit is the deployment. Every per-shard operation shares the one deployment
("cake"), so all shards rendered as identical "cake" rows, and — because the
per-deployment Details map is keyed by deployment name — the per-shard detail
collided and a failed shard's error was dropped (it showed a pending sibling's
body instead). That is why a failed rollout surfaced no error.

Add a dedicated sharded layout:
- pkg/webhook/templates/sharded_apply.go: RenderShardedApplyComment. Shards are
  grouped by change signature — a uniform apply renders one Shard|Status table
  with the DDL once; divergent shards (different tables, or the same table
  computing to different DDL because shards drifted) render one group per
  distinct change set, each tying its shards' statuses to the DDL it runs. The
  first failed shard's error is lifted to the top and shown in its row.
- pkg/webhook/sharded_apply.go: detect a sharded apply (work ops keyed
  namespace/shard/table, one deployment), and build the comment input from the
  operation rows + tasks. Per-shard status is derived through pkg/presentation
  with the shard name as the operation identity, so ordering labels reference
  shards ("waiting for -40", "halted — -40 failed") with no presentation change.
  formatApply{Status,Summary}Comment route sharded applies here.

Finalizer (VSchema) operations are out of scope for the shard view for now;
their outcome is still reflected in the aggregate headline state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@morgo morgo requested review from Kiran01bm and aparajon as code owners June 26, 2026 13:43
Copilot AI review requested due to automatic review settings June 26, 2026 13:43

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a dedicated PR-comment renderer for sharded applies (per-shard fan-out within a single deployment), fixing the prior multi-deployment layout’s shard-collision behavior and enabling correct per-shard status + error surfacing, including divergent shard variants.

Changes:

  • Introduces a new sharded-apply comment template that groups shards by change signature and surfaces the first shard failure prominently.
  • Adds sharded-apply detection + data projection in the webhook layer, routing status/summary comments to the new renderer.
  • Adds unit tests covering sharded key parsing/detection, sharded comment rendering (uniform/divergent), and error surfacing.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
TEMPLATES.md Updates documented template example headings/metadata formatting.
pkg/webhook/templates/sharded_apply.go New sharded-apply PR comment renderer (group-by-variant, shard-unit tables, first-failure callout).
pkg/webhook/templates/sharded_apply_test.go Tests for sharded-apply template rendering behavior (uniform/divergent, error surfacing).
pkg/webhook/sharded_apply.go Sharded-apply detection and projection into template input (per-shard status via presentation).
pkg/webhook/sharded_apply_test.go Tests for sharded key parsing/detection and sharded status comment output regression coverage.
pkg/webhook/multi_apply.go Routes sharded applies to the new sharded template for both status and terminal summary comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/webhook/templates/sharded_apply.go
Comment thread pkg/webhook/templates/sharded_apply.go
Comment thread pkg/webhook/templates/sharded_apply.go
morgo and others added 2 commits June 26, 2026 08:02
The sharded layout only recognised the terminal Failed state, so an apply that
SchemaBot is auto-retrying (failed_retryable) lost its error and its next-action
guidance. Mirror the single-deployment renderer:
- isShardFailureState covers Failed and FailedRetryable; the first-failure
  callout and the per-shard status cell both surface the error for either.
- writeShardedFooter adds the FailedRetryable case with the "stop retrying"
  command, matching writeApplyFooter.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The plan comment was keyspace-level: a divergent sharded plan (shards needing
different changes — drift, or a partially-applied canary) collapsed to one DDL
block, so an operator couldn't see what would apply where. The per-shard plan
data block#535 persists never reached the comment — apitypes.PlanResponse carried
only the namespace-level Changes.

- apitypes.PlanResponse gains Shards ([]ShardPlanResponse), and
  planResponseFromProto carries them from the proto plan response.
- KeyspaceChangeData gains per-shard Shards; buildPlanCommentData threads them.
- writeKeyspaceChanges groups a sharded keyspace's shards by change signature:
  uniform shows the DDL once; divergent renders "what applies where" — one DDL
  block per distinct change set, labelled with the shards it applies to.

No gap change needed: the per-shard data already flows from gap's Strata engine
through the proto plan response; only the control-plane rendering changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@morgo morgo changed the title feat(webhook): dedicated sharded-apply comment (per-shard, group-by-variant) feat(webhook): dedicated sharded apply + plan comments (per-shard, group-by-variant) Jun 26, 2026
@morgo morgo requested a review from Copilot June 26, 2026 14:10

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread pkg/webhook/sharded_apply.go Outdated
buildShardedApplyData took only the first task's DDL for an operation. An
operation can carry more than one task for its (namespace, shard, table) when a
shard plan yields multiple statements for the same table, so this dropped
statements and corrupted the change signature used to group shards. Join every
non-empty task DDL in task order instead. Covered by
TestBuildShardedApplyData_JoinsMultiTaskDDL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comment thread pkg/webhook/sharded_apply.go
Comment thread pkg/webhook/templates/plan.go
- isShardedApply now also requires every shard work operation to share one
  namespace. A multi-keyspace apply (still one deployment) would otherwise route
  to the sharded layout and mislabel the keyspace / mix unrelated shard groups;
  it now falls back to the existing layout.
- countChanges counts a keyspace's per-shard statements when the collapsed
  namespace-level Statements are absent (keyspaceStatementCount), so a sharded
  plan whose only DDL is per-shard is not miscounted as "no changes" and gets
  correct summary counts.
- Tests for multi-keyspace fallback and per-shard-only plan counting.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Comment thread pkg/webhook/sharded_apply.go
Comment thread pkg/webhook/sharded_apply.go
Comment thread pkg/api/proto_helpers.go
Comment thread pkg/webhook/plan.go
…n apply

Apply comment: a remote failure records its error on the operation's task, and
the operator may not stamp it onto the operation row — so the sharded apply
comment showed a failed shard with no error (forcing a trip to logs).
aggregateShardState now falls back to the first task error when the operation
row carries none, so a failed shard always shows why.

Plan comment: an unsafe change confined to one shard (e.g. a column drop on a
single drifted shard) is now flagged with that shard. shardedUnsafeChanges
derives unsafe changes from the per-shard plan (grouped by table+reason → shards)
when present, since the collapsed namespace-level Changes can omit a shard's
change; the unsafe-changes warning names the shard(s).

Tests: apply falls back to task error; plan surfaces a per-shard unsafe change
scoped to its shard (both the builder and the rendered warning).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread pkg/webhook/plan.go
Copilot fixes:
- parseShardOperationKey rejects keys with extra segments (strict 3-part split),
  so "ns/-40/table/extra" is no longer misclassified as shard work.
- buildShardedApplyData sorts each operation's tasks by id before joining DDL,
  so the joined DDL and the change signature are deterministic (tasks arrive
  created_at DESC). In practice an operation has one task — multiple statements
  for one table are combined into one ALTER upstream — so this is defensive.
- planResponseFromProto drops shard plans with no changes, and buildPlanCommentData
  skips per-shard entries with no DDL, so an empty shard never renders a blank
  section or inflates the change count (a shard is changing iff it has changes).

Control-plane fail-closed guard (requested):
- dispatchPendingApply refuses to dispatch a shard work operation
  ("namespace/shard/table") that resolves no target shard, with a clear error
  naming the skew, instead of sending an empty TargetShards that the data plane
  rejects opaquely as "expected exactly one target shard, got 0".

Also correct the per-shard unsafe test/preview: a same-table change is one
combined ALTER, not multiple statements (multi-statement-per-table is unsupported).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

The sharded apply/plan comments weren't represented in the generated
TEMPLATES.md catalog. Add preview renderings (pkg/webhook/templates/
preview_sharded.go) for the divergent plan, the per-shard unsafe-change plan,
and the in-progress / failed / divergent apply comments — per-shard statuses
derived through pkg/presentation so they match production — wire a
comment_sharded_all aggregate preview type, and add a "Sharded Apply" section to
update-templates.sh. Regenerated TEMPLATES.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@morgo morgo merged commit 8c88e59 into block:main Jun 26, 2026
32 checks passed
@morgo morgo deleted the morgo/sharded-apply-presentation branch June 26, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants